Dynamic hard error detection

ABSTRACT

A method of testing a circuit includes halting a flow of normal data through the circuit, running test data through the circuit while subjecting the circuit to a stress condition, and determining whether a hard error exists in the circuit based on the running of the test data.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 13/765,320, filed Feb. 12, 2013, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

Physical structures on computer chips degrade with age. Electromigration, thermal stress and other conditions that increase or self-perpetuate over time may result in hard errors in logic, latches, registers or arrays on the chip, such as stuck-at faults or delays. In conventional chips, detecting hard errors is costly, because it requires either detection logic added to the chip, requiring chip area and power, or it consumes resources resulting in degraded performance or energy-consumption of the chip.

In addition, when a chip is manufactured, the chip may be tested to determine acceptable ranges of operation, then a margin may be added to one acceptable operating level and the chip may be programmed or designed to operate outside the margin to account for anticipated degradation of chip performance during the life of a chip. For example, a chip may be designed to operate below its optimal frequency based on estimates that over time the optimal frequency will drift downward.

SUMMARY

Embodiments include an apparatus for detecting hard errors in a circuit. The apparatus may include storage and a processing circuit. The storage has stored therein test data and normal data. The processing circuit includes combinational logic in series with at least one set of input latches and at least one set of output latches. The apparatus includes a test control module configured to control the processing circuit to halt a flow of normal data through the processing circuit and run the test data through the processing circuit while subjecting the processing circuit to a stress condition.

Additional embodiments include a computer program product for testing a circuit. The computer program product includes a tangible computer-readable storage medium having stored thereon a computer program for performing a method. The method includes halting a flow of normal data through the circuit, running test data through the circuit while subjecting the circuit to a stress condition, and determining whether a hard error exists in the circuit based on the running of the test data.

Further embodiments include a method including halting a flow of normal data through a circuit, running test data through the circuit while subjecting the circuit to a stress condition, and determining whether a hard error exists in the circuit based on the running of the test data.

Additional features and advantages are realized by implementation of embodiments of the present disclosure. Other embodiments and aspects of the present disclosure are described in detail herein and are considered a part of the claimed invention. For a better understanding of the embodiments, including advantages and other features, refer to the description and to the drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The subject matter which is regarded as embodiments of the present disclosure is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the embodiments are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 illustrates a system for testing a circuit according to an embodiment of the disclosure;

FIG. 2 illustrates a system for testing combinational logic according to one embodiment;

FIG. 3 illustrates a flowchart of a method for testing a circuit; and

FIG. 4 illustrates a computer system according to one embodiment.

DETAILED DESCRIPTION

The performance of conventional chips or circuits may degrade over time, resulting in hard errors, such as stuck-at or delay errors in the circuitry. Embodiments of the disclosure relate to providing in-line testing of a chip or circuit over the life of the chip or circuit.

FIG. 1 illustrates a system 100 according to an embodiment of the disclosure. The system 100 includes a computing system 110 and may include an external test control module 140. The computing system 110 may be a computer, including a processing circuit 112 and storage 119 within one housing or outer frame, or the computing system 110 may be distributed among multiple housings and structures. The test control module 140 may include a device or computer or may include a computer program executed by one or more computers or systems to control testing of the computing system 110.

The computing system 110 may include the processing circuit 112, storage 119, system clock generator 126, system power module 128 and cooling system 130. The processing circuit 112 may include one or more processors having one or more processing cores, control and processing logic, and any other hardware for performing processing of data. The processing circuit 112 may include normal operation circuitry 116 and test circuitry 114. The normal operation circuitry 116 may include, for example, input and output latches in series with combinational logic circuitry 117, such that data is input into input latches, from the input latches to the combinational logic, and from the combinational logic to output latches. The normal operation circuitry 116 may also include registers and arrays 118 for data storage, and particularly for short-term storage of data as the data is being processed by the processing circuit 112. The normal operation circuitry 116 may be defined as circuitry that is used to process data during normal operation of the processing circuit 112. The test circuitry 114 may include logic and other circuitry for providing test data to the latches and other hardware of the normal operation circuitry.

Throughout the present specification and claims, “normal operation” may be defined as operation of the computing system 110 to perform the regular functions of the computing system 110 and not to perform test-specific operations on the hardware or software of the computing system 110 to test for hard errors. Similarly, an environment in which the processing circuit 112 operates normally is defined as an environment in which the processing circuit 112 performs normal and non-testing-specific functions of the computing system 110. Testing-specific functions are defined as functions in which normal data is replaced by testing data or normally-used circuitry is replaced with testing circuitry.

The storage 119 may store normal data 120, test data 122 and a test program 124. In the present specification and claims, “normal data” is defined as data generated or used during normal operation of the computing system 110 and not during testing to test the functionality of the computing system. “Test data” 122 is defined as data that is provided to a circuit in the place of normal data to test the functionality of hardware or software of the computing system 110.

The test program 124 may be provided in addition to, or alternatively to, the test control module 140. In other words, in one embodiment, testing of the processing circuit 112 is controlled by an external device, such as the test control module 140, and in an alternative embodiment, testing of the processing circuit 112 is controlled by the test program 124 internal to the computing system 110. In yet another alternative embodiment, some portions of the testing of the processing circuit 112 may be performed by the test control module 140 and other portions may be performed by the test program 124. In yet another embodiment, the test circuitry 114 is capable of performing a built-in self-test independently of the test program 124 and the test control module 140. In other words, the processing circuit 112 may perform an entirely self-generated and self-contained built-in self-test.

In one embodiment, the processing circuit 112 is tested by a built-in self-test (BIST), such as the test program 124 or other BIST utilizing the test data 122. The BIST may be configured to test the integrity of the latches and logic 117, the registers and arrays 118 and any other hardware or software of the processing circuit 112. In embodiments of the present disclosure, one or both of the test control module 140 and the test program 124 may test the processing circuit 112 in predetermined environmental conditions. Examples of environmental conditions include performing the BIST at a predetermined frequency, voltage or temperature. In addition, a volume of data provided to the processing circuit 112 may be controlled.

In particular, in embodiments of the present disclosure one or both of the test control module 140 and the test program 124 may set the environmental conditions at levels to stress the processing circuit 112. The levels may correspond to conditions in which the processing circuit 112 operates under normal conditions, or to conditions more extreme than those in which the processing circuit 112 operates under normal conditions.

In one embodiment, one of the test control module 140 and the test program 124 controls the system clock generator 126 to increase an operating frequency of the processing circuit 112 relative to a normal operating frequency. In another embodiment, one of the test control module 140 and the test program 124 controls the system power module 128 to increase or decrease a voltage of signals, such as data signals, provided to the processing circuit 112, or to increase or decrease a voltage supplied to run components of the processing circuit 112, such as latches and combinational logic.

In another embodiment, one of the test control module 140 and the test program 124 controls the processing circuit 112 to access an increased volume of data relative to normal operation of the processing circuit 112 that stresses the noise margin on the voltage supply of the circuit. Although a few examples of environmental conditions have been provided to illustrate stresses to a processing circuit 112, embodiments of the present disclosure encompass any environmental stresses that may be applied to a processing circuit 112 from a source within the computing system 110 or external to the computing system 110.
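By way of a non-limiting illustration, the stress conditions described above (frequency, voltage and data volume) may be represented as a derived operating point. The following is a minimal sketch only; the names OperatingPoint and stressed, the units, and the numeric values are assumptions made for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical record of the conditions a test controller may set."""
    frequency_mhz: float  # clock frequency (e.g. from a system clock generator)
    voltage_v: float      # supply voltage (e.g. from a system power module)
    data_volume: float    # relative volume of data accessed; 1.0 = normal


def stressed(nominal, freq_scale=1.0, voltage_delta_v=0.0, volume_scale=1.0):
    """Return a new operating point derived from the nominal one.

    Scale factors of 1.0 and a delta of 0.0 reproduce the nominal point,
    which corresponds to testing under normal operating conditions.
    """
    return replace(
        nominal,
        frequency_mhz=nominal.frequency_mhz * freq_scale,
        voltage_v=nominal.voltage_v + voltage_delta_v,
        data_volume=nominal.data_volume * volume_scale,
    )


# Illustrative values only: raise the clock by 10%, lower the supply by
# 50 mV and double the data volume relative to normal operation.
nominal = OperatingPoint(frequency_mhz=2000.0, voltage_v=1.0, data_volume=1.0)
stress = stressed(nominal, freq_scale=1.1, voltage_delta_v=-0.05, volume_scale=2.0)
```

The nominal operating point is left unchanged, so the same record may be reused when normal operation resumes.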

FIG. 2 illustrates an example of a system 200 for testing a logic circuit 220, such as the latches and logic 117 of FIG. 1. The system 200 includes the logic circuit 220 and storage 210 for storing checkpointed normal data in a checkpoint 212 and also for storing test data 214. The logic circuit 220 includes a set of input latches 222, combinational logic 224 and a set of output latches 226. The sets of input and output latches 222 and 226 may be of any size. In addition, while only one set of input latches 222, one combinational logic block 224 and one set of output latches 226 are illustrated in FIG. 2, the logic circuit 220 may typically include multiple combinational logic blocks arranged in series between latches. For example, the set of output latches 226 may act as a set of input latches for a next combinational logic block in series, and any number of combinational logic blocks may be connected in series bracketed by input and output latches.

During normal operation, data to be operated on by the combinational logic 224 is provided from an external source via input/output (I/O) connections, via a data bus, via storage (such as storage 119 of FIG. 1), or via any other source or combination of sources. The normal data is held in the set of input latches 222, and when latched, the set of input latches 222 outputs the normal data to the combinational logic 224. The combinational logic outputs the normal data having been subjected to logical functions to the output latches 226. Upon asserting the latch, the normal data is output to a next combinational logic block or to an output bus, storage, I/O connections, or any other output and new normal data is provided from the set of input latches 222 to the combinational logic 224.

Embodiments of the present disclosure include detecting hard errors in the set of input latches 222, the set of output latches 226 and the combinational logic 224. The hard errors may correspond to stuck states of the latches or logic or any other type of hard error. The hard errors are detected by testing the system 200 while subjecting the system to a predetermined stress condition, such as a temperature, frequency or voltage or by adjusting a volume of data provided to the system 200 per unit of time.

The system 200 may be tested by storing the data from the set of input latches 222 and the set of output latches 226 in the checkpoint 212 of storage to store the operational state of the logic circuit 220. In other words, the system 200 may operate normally for a predetermined period of time, and then the normal operation may be halted to perform the hard error test. The states of the sets of input and output latches 222 and 226 may be stored by reading the outputs from the latches to the combinational logic, by dedicated data lines from the latches to the storage 210 or by any other method.

Once the states of the sets of input and output latches 222 and 226 are stored in the checkpoint 212, test data 214 from the storage 210 may be provided to the set of input latches 222. Although FIG. 2 illustrates test data 214 provided from storage 210, embodiments of the disclosure encompass test data 214 provided from any source, such as internal or external devices to a system, computer or assembly of which the logic circuit 220 is a part. The test data 214 may be cycled through the set of input latches 222, modified by the combinational logic 224 and cycled through the set of output latches 226. The process may be repeated for any number of combinational logic blocks. Data output from the sets of input and output latches 222 and 226 may be provided to storage 210 and to a test analysis circuit (such as the test control module 140 or test program 124 of FIG. 1). The test data results may be analyzed to detect hard errors in the sets of latches 222 and 226 and the combinational logic 224.

In addition to reading data output from the sets of latches 222 and 226, data may be read out from the output of the combinational logic 224 to isolate errors in the combinational logic 224. For example, it may be determined that no error exists at the output of the set of input latches 222 but an error exists at the output of the combinational logic 224, and a hard error may accordingly be identified within the combinational logic 224. In addition, while FIG. 2 illustrates test data provided to the set of input latches 222, test data may be provided to any number of latches. For example, test data may be provided to both the set of input latches 222 and the set of output latches 226, or test data may be provided only to the set of input latches 222 and the test data may be cycled through the set of input latches 222 to the combinational logic 224 and the output latches 226. The output latches 226 may be latched to provide the test data to a next set of combinational logic and output latches, and so forth.
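The isolation of an error to a particular stage, as described above, may be sketched as a comparison of the value observed at each stage output against the value expected for the same test pattern. The sketch below is illustrative only; the function name isolate_fault, the stage names and the bit patterns are hypothetical.

```python
# Stage outputs are compared in signal-flow order, so the first mismatch
# identifies the earliest stage at which the test data was corrupted.
STAGES = ("input_latches", "combinational_logic", "output_latches")


def isolate_fault(expected, observed):
    """Return the first stage whose observed output differs, or None."""
    for stage in STAGES:
        if observed[stage] != expected[stage]:
            return stage
    return None


# Illustrative 4-bit example: the input latches reproduce the test pattern
# correctly, but the combinational logic output has one wrong bit, which
# then propagates unchanged to the output latches.
expected = {
    "input_latches": 0b1010,
    "combinational_logic": 0b0101,
    "output_latches": 0b0101,
}
observed = {
    "input_latches": 0b1010,
    "combinational_logic": 0b0111,
    "output_latches": 0b0111,
}
faulty_stage = isolate_fault(expected, observed)
```

In this example no error exists at the output of the input latches while an error exists at the output of the combinational logic, so the combinational logic is reported as the faulty stage.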

Once the test is complete, if the number of hard errors is below a predetermined threshold, the normal data may be restored to the sets of input and output latches 222 and 226, and the system 200 may continue to operate normally.
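The checkpoint, test and restore sequence described with reference to FIG. 2 may be sketched in software as follows. This is a toy model under stated assumptions and not an implementation of the disclosed hardware: the class LogicCircuit, the bitwise-inversion logic function, the injected stuck-at-one fault and the test patterns are all hypothetical.

```python
def reference_logic(value, mask):
    """Fault-free behaviour assumed for the combinational logic (inversion)."""
    return ~value & mask


class LogicCircuit:
    """Toy model: input latches -> combinational logic -> output latches."""

    def __init__(self, width=8, stuck_at_one_mask=0):
        self.mask = (1 << width) - 1
        # Hypothetical injected fault: these output-latch bits always read 1.
        self.stuck_at_one_mask = stuck_at_one_mask
        self.input_latches = 0
        self.output_latches = 0

    def cycle(self, value):
        """Latch a value, apply the logic and latch the result."""
        self.input_latches = value & self.mask
        result = reference_logic(self.input_latches, self.mask)
        self.output_latches = result | self.stuck_at_one_mask
        return self.output_latches


def run_hard_error_test(circuit, test_data, threshold=1):
    """Checkpoint the latches, run the test data, restore if the test passes."""
    checkpoint = (circuit.input_latches, circuit.output_latches)
    errors = 0
    for pattern in test_data:
        expected = reference_logic(pattern & circuit.mask, circuit.mask)
        if circuit.cycle(pattern) != expected:
            errors += 1
    passed = errors < threshold  # "below a predetermined threshold"
    if passed:
        circuit.input_latches, circuit.output_latches = checkpoint
    # The checkpoint is returned so a caller may move it elsewhere on failure.
    return passed, errors, checkpoint


patterns = [0x00, 0xFF, 0xAA, 0x55]

healthy = LogicCircuit()
healthy.cycle(0x3C)  # normal data before the test
healthy_result = run_hard_error_test(healthy, patterns)

faulty = LogicCircuit(stuck_at_one_mask=0x01)
faulty_result = run_hard_error_test(faulty, patterns)
```

Complementary patterns are used so that each latch bit is driven to both 0 and 1; a bit stuck at one is only exposed by a pattern whose expected output has that bit at 0.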

FIG. 3 illustrates a process according to an embodiment of the disclosure. In block 302, normal operation of a circuit is halted. “Normal operation” refers to processing data to perform the functions of the circuit, and does not include performing testing of the circuit for hard errors. In block 304, the normal data is stored in a checkpoint. In block 306, a stress condition is set. The stress condition may correspond to an actual or simulated environmental condition.

For example, if the circuit operates normally in an environment having a range of temperatures, the stress condition may correspond to the range of temperatures, or temperatures just outside the range of temperatures. Alternatively, if the circuit operates normally at a predetermined frequency, then the frequency of the circuit may be set at the predetermined frequency to test the circuit under its normal operating conditions, or the frequency of the circuit may be increased or decreased from the predetermined frequency to provide additional stress levels to the circuit. In yet another example, if the circuit operates at a predetermined operating voltage, then the voltage of the circuit during testing may be set at the predetermined operating voltage or may be increased or decreased to provide additional stress to the circuit. While a few examples have been provided, any type of stress condition, including any actual or simulated environmental condition, may be set to provide a predetermined stress to the circuit.

In block 308, a circuit test is performed. The circuit test may be, for example, a BIST of latches, logic, registers and/or arrays of a processing circuit to detect hard errors in the processing circuit. The circuit test may include replacing the normal data in the circuit, having been stored in the checkpoint, with test data and performing test operations, such as cycling latches and logic and performing read and write operations on registers and arrays.
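The read and write operations on registers and arrays mentioned above may be illustrated by a simple write and read-back pattern test. The sketch below is an assumption-laden toy and is not a specific named memory test algorithm; the class FaultyArray, its stuck-at-zero fault injection and the chosen patterns are hypothetical.

```python
class FaultyArray:
    """Toy storage array whose cells may have bits stuck at zero."""

    def __init__(self, size, width=8, stuck_at_zero=None):
        self.cells = [0] * size
        self.mask = (1 << width) - 1
        # Hypothetical fault injection: address -> mask of bits stuck at 0.
        self.stuck_at_zero = stuck_at_zero or {}

    def write(self, addr, value):
        stuck = self.stuck_at_zero.get(addr, 0)
        self.cells[addr] = value & self.mask & ~stuck

    def read(self, addr):
        return self.cells[addr]


def array_test(array, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every address, read it back, record mismatches."""
    failures = []
    for pattern in patterns:
        for addr in range(len(array.cells)):
            array.write(addr, pattern)
        for addr in range(len(array.cells)):
            if array.read(addr) != pattern & array.mask:
                failures.append((addr, pattern))
    return failures


good_failures = array_test(FaultyArray(4))
bad_failures = array_test(FaultyArray(4, stuck_at_zero={2: 0x01}))
```

Only patterns that set the stuck bit to 1 expose the fault, which is why the failing entries for address 2 are reported for 0xFF and 0x55 and not for 0x00 or 0xAA.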

In block 310 it is determined if the circuit test is successful. If the circuit test is successful, such as if the number of hard errors detected is less than a predetermined threshold, then the test outcome may be reported as “OK” in block 312. In addition, the normal data may be restored to the circuit in block 314 and normal operation of the circuit may be resumed in block 315. In embodiments of the present disclosure, the testing of a circuit may be performed in-line, or while the circuit is connected in its normal operating environment instead of only after the circuit is manufactured or is in a non-normal operating environment. In addition, when a circuit is tested outside of its normal operating environment, stress conditions may be set to simulate conditions in the normal operating environment. In addition, when the circuit is tested in the normal operating environment, additional stress conditions may be set to test operating characteristics of the circuit outside, or at the extremes, of normal operating conditions.

If, in block 310, it is determined that the test is not successful, in block 316 it may be determined whether the test should include a search for a new error-free operating point, where the operating point corresponds to conditions under which the circuit may operate. For example, if the circuit is tested in a first test condition corresponding to a first temperature, voltage or frequency, it may be determined in block 316 whether the test is set up to find another temperature, voltage or frequency at which the circuit may operate. The decision of block 316 may correspond to comparing the set stress condition with ideal operating conditions, with operating condition limits (such as a maximum or minimum possible operating temperature, voltage or frequency) or any other threshold stress condition.

If it is determined in block 316 that no further search for an error-free point should be performed, the core or chip including the circuit may be shut down in block 326, the normal data from the checkpoint may be moved to another core or chip in block 328 and normal operation may proceed on the other core or chip based on the stored normal data from the tested circuit.

On the other hand, if it is determined in block 316 that a search should be made for a new error-free operating point, then it may be determined in block 318 if a maximum number of iterations of the test has been reached. If not, then in block 320 a stress level may be reduced, such as by adjusting a temperature, voltage or frequency of the tested circuit, and the test is repeated in block 308.

However, if it is determined in block 318 that the maximum number of iterations has been reached, then a test error may be reported in block 322 and the last error-free conditions may be reported. In block 324, it is determined whether the tested circuit should continue normal operation. For example, it may be determined whether a number of detected hard errors is less than a predetermined threshold, whether detected hard errors would adversely affect operation of the circuit or any other consideration. If it is determined that normal operation should continue, then the test may be reported as “OK” in block 312, the stored normal data may be restored in block 314, and normal operation may resume in the circuit in block 315.

On the other hand, if it is determined in block 324 that normal operation should not resume, the core or chip associated with the tested circuit may be shut down in block 326, the stored normal data of the tested circuit may be moved to another core or chip in block 328 and normal operation on the stored normal data may resume on the new core or chip in block 315.
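The decision flow of FIG. 3 may be summarized by the following sketch, in which each branch is annotated with the corresponding block number. The function run_inline_test, its parameters and the example circuit that fails above 2000 MHz are hypothetical and serve only to trace the blocks; the two thresholds are assumed to be set independently.

```python
def run_inline_test(count_errors, stress, reduce_stress, search=True,
                    max_iterations=3, pass_threshold=1, continue_threshold=0):
    """Walk the decisions of FIG. 3; return (outcome, final_stress, log)."""
    log = ["302 halt", "304 checkpoint", "306 set stress"]
    iterations = 0
    while True:
        errors = count_errors(stress)                  # block 308
        log.append("308 test")
        if errors < pass_threshold:                    # block 310: successful
            log += ["312 OK", "314 restore", "315 resume"]
            return "resume", stress, log
        if not search:                                 # block 316: no search
            log += ["326 shut down", "328 move data"]
            return "migrate", stress, log
        if iterations >= max_iterations:               # block 318: limit hit
            log.append("322 report error")
            if errors < continue_threshold:            # block 324: continue
                log += ["312 OK", "314 restore", "315 resume"]
                return "resume", stress, log
            log += ["326 shut down", "328 move data", "315 resume elsewhere"]
            return "migrate", stress, log
        stress = reduce_stress(stress)                 # block 320
        iterations += 1


def errors_at(freq_mhz):
    """Hypothetical circuit: three hard errors whenever clocked above 2000 MHz."""
    return 0 if freq_mhz <= 2000 else 3


def step_down(freq_mhz):
    """Hypothetical stress reduction: lower the clock by 100 MHz."""
    return freq_mhz - 100


found = run_inline_test(errors_at, 2200, step_down)
gave_up = run_inline_test(errors_at, 2200, step_down, max_iterations=1)
no_search = run_inline_test(errors_at, 2200, step_down, search=False)
```

With the default settings the search settles at 2000 MHz after three test runs and normal operation resumes on the tested circuit; with a single permitted iteration, or with the search disabled, the sketch takes the shut-down and data-move path instead.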

FIG. 3 illustrates one example of a method for testing a circuit according to embodiments of the present disclosure. However, it is understood that embodiments of the disclosure encompass processes including additional operations or having operations omitted relative to FIG. 3. In other words, embodiments of the disclosure encompass any process for testing a circuit having previously been operating in a normal operating mode and applying stress conditions to the circuit during testing.

Embodiments may be implemented in any computer system, device, chip or circuit. For example, in one embodiment a logic circuit (such as the logic circuit 100 of FIG. 1) or the system for testing the logic circuit (such as the system 200 of FIG. 2) may be implemented in an active memory cube (AMC) device. An AMC may be a storage device that contains multiple layers of addressable memory elements. The memory may be divided into memory vaults, or three-dimensional blocked regions of the memory cube which share a common memory controller or processing element, and are capable of servicing memory access requests to their domain of memory independently of one another. In one embodiment, each separate memory controller or processing element may include a logic circuit or system for testing the logic circuit according to embodiments of the present disclosure. An active memory cube, or active buffered memory, is described in further detail in U.S. application Ser. No. 13/566,019, the contents of which are hereby incorporated by reference in their entirety.

FIG. 4 illustrates a block diagram of a computer system 400 according to another embodiment of the present disclosure. The methods described herein can be implemented in hardware, software (e.g., firmware), or a combination thereof. In one embodiment, the methods described herein are implemented in hardware as part of the microprocessor of a special or general-purpose digital computer, such as a personal computer, workstation, minicomputer, or mainframe computer. The system 400 therefore may include a general-purpose computer or mainframe 401 capable of testing a reliability of a base program by gradually increasing a workload of the base program over time.

In one embodiment, in terms of hardware architecture, as shown in FIG. 4, the computer 401 includes one or more processors 405, memory 410 coupled to a memory controller 415, and one or more input and/or output (I/O) devices 440, 445 (or peripherals) that are communicatively coupled via a local input/output controller 435. The input/output controller 435 can be, for example, one or more buses or other wired or wireless connections, as is known in the art. The input/output controller 435 may have additional elements, which are omitted for simplicity in description, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components. The input/output controller 435 may include a plurality of sub-channels configured to access the output devices 440 and 445. The sub-channels may include, for example, fiber-optic communications ports. The input/output controller 435 may also transmit and receive data to/from an external computer-readable storage medium 447.

The processor 405 is a hardware device for executing software, particularly that stored in storage 420, such as cache storage, or memory 410. The processor 405 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer 401, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing instructions.

The memory 410 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 410 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 410 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 405.

The instructions in memory 410 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 4, the instructions in the memory 410 include a suitable operating system (O/S) 411. The operating system 411 essentially controls the execution of other computer programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.

In one embodiment, a conventional keyboard 450 and mouse 455 can be coupled to the input/output controller 435. Other output devices such as the I/O devices 440, 445 may include input devices, for example but not limited to a printer, a scanner, microphone, and the like. Finally, the I/O devices 440, 445 may further include devices that communicate both inputs and outputs, for instance but not limited to, a network interface card (NIC) or modulator/demodulator (for accessing other files, devices, systems, or a network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, and the like. The system 400 can further include a display controller 425 coupled to a display 430. In an exemplary embodiment, the system 400 can further include a network interface 460 for coupling to a network 465. The network 465 can be an IP-based network for communication between the computer 401 and any external server, client and the like via a broadband connection. The network 465 transmits and receives data between the computer 401 and external systems. In an exemplary embodiment, network 465 can be a managed IP network administered by a service provider. The network 465 may be implemented in a wireless fashion, e.g., using wireless protocols and technologies, such as WiFi, WiMax, etc. The network 465 can also be a packet-switched network such as a local area network, wide area network, metropolitan area network, Internet network, or other similar type of network environment. The network 465 may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), intranet or other suitable network system and includes equipment for receiving and transmitting signals.

When the computer 401 is in operation, the processor 405 is configured to execute instructions stored within the memory 410, to communicate data to and from the memory 410, and to generally control operations of the computer 401 pursuant to the instructions.

In one embodiment, the methods described herein can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.

In one embodiment, the tested circuit may include any one of the processor 405, storage 420, network interface 460, memory 410, display controller 425, memory controller 415 and I/O controller 435. The test may be a built-in self-test and may be carried out by a program stored in memory 410 or storage 420. In another embodiment, the test may be performed by an external device connected to the computer 401, such as via the network 465.

As described above, embodiments can be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. An embodiment may include a computer program product on a computer readable/usable medium with computer program code logic containing instructions embodied in tangible media as an article of manufacture. Exemplary articles of manufacture for computer readable/usable medium may include floppy diskettes, CD-ROMs, hard drives, universal serial bus (USB) flash drives, or any other computer-readable storage medium, wherein, when the computer program code logic is loaded into and executed by a computer, the computer becomes an apparatus for practicing the embodiments. Embodiments include computer program code logic, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code logic is loaded into and executed by a computer, the computer becomes an apparatus for practicing the embodiments. When implemented on a general-purpose microprocessor, the computer program code logic segments configure the microprocessor to create specific logic circuits.

As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the present disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In embodiments of the disclosure, a chip test may be used to detect any stuck-at and delay faults in a chip including latches, combinational logic, registers and arrays after fabrication of the chip. The chip testing may be performed at regular intervals throughout the lifetime of the chip to detect hard errors as they develop and to identify chips that are no longer capable of running error-free within defined operating ranges or conditions.

In one embodiment, built-in self-test software and hardware may be used to perform the testing. The testing may be performed for any desired length of time and at any desired interval throughout the life of the chip. The continued testing throughout the lifetime of the chip provides for error detection of patterns that may seldom occur during normal program execution. Embodiments of the disclosure encompass performing a logic built-in self-test (LBIST), array built-in self-test (ABIST) or performing a test with an off-chip test engine.

Embodiments of the present disclosure also provide for operating a chip at a higher frequency throughout the life of the chip by determining optimal operating frequencies, not only immediately upon fabrication, but also throughout the life of the chip. For example, instead of providing a margin at fabrication to account for the likely degradation of operating frequency of the chip over its lifetime, and operating the chip outside the margin throughout its lifetime, embodiments of the present disclosure may reduce or eliminate the margin by determining optimal operating frequencies at regular intervals and operating the chip at the optimal frequency or within a predetermined range of the optimal frequency. Other examples of operating margins include operating voltage of a chip, operating temperature of the chip, data volume over time supplied to the chip, or any other condition.
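By way of illustration only, the periodic determination of an optimal operating frequency may be sketched in software as follows. The function names, the list of candidate frequencies, the `self_test_passes` callback and the `guard_band_mhz` parameter are hypothetical and are not taken from the disclosure; the callback stands in for whatever self-test the chip actually runs at a given frequency.

```python
from typing import Callable, Optional, Sequence


def highest_passing_frequency(
    candidates_mhz: Sequence[int],
    self_test_passes: Callable[[int], bool],
) -> Optional[int]:
    """Return the highest candidate frequency at which the self-test passes.

    self_test_passes is a hypothetical hook that runs the test data through
    the circuit at the given frequency and reports whether it ran error-free.
    Returns None when no candidate passes.
    """
    for freq in sorted(candidates_mhz, reverse=True):
        if self_test_passes(freq):
            return freq
    return None


def operating_frequency(optimal_mhz: int, guard_band_mhz: int = 0) -> int:
    """Pick an operating point at, or a fixed range below, the optimal frequency."""
    return max(optimal_mhz - guard_band_mhz, 0)
```

Repeating such a search at regular intervals, rather than fixing a lifetime margin at fabrication, is one possible way to track the downward drift of the optimal frequency described above.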

Embodiments of the disclosure may be implemented to test different types of logic. For example, in the case of protected logic, error correction circuitry (ECC) or parity protection from which hard errors can be detected may be installed in all arrays, registers or latches that hold an architected state. When an ECC/parity error is detected, a data operation may be retried and an error counter may be used to determine whether the error recurs. If so, then the error may be treated as a hard error. In this embodiment, it may be unnecessary to reload stored data after a test is performed, and errors may be detected as the normal data is processed.
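The retry-and-count behavior described above may be sketched, under stated assumptions, as follows. The names `classify_error` and `retry_operation`, and the default values of `max_retries` and `recurrence_threshold`, are hypothetical; the disclosure does not specify how many retries are attempted or how many recurrences qualify an error as hard.

```python
from typing import Callable


def classify_error(
    retry_operation: Callable[[], bool],
    max_retries: int = 3,
    recurrence_threshold: int = 2,
) -> str:
    """Classify a detected ECC/parity error as "hard" or "soft".

    retry_operation is a hypothetical hook that re-runs the failing data
    operation and returns True when the ECC/parity error is seen again.
    """
    error_count = 0  # the error counter for this operation
    for _ in range(max_retries):
        if retry_operation():
            error_count += 1
            if error_count >= recurrence_threshold:
                return "hard"  # the error recurs, so treat it as a hard error
    return "soft"  # the error did not recur often enough
```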

In embodiments of the disclosure, all of the processing cores or units of a chip may be tested in parallel. An LBIST and an ABIST may be performed for all logic and latches. If an error is detected, the LBIST or ABIST may be repeated at a different set stress condition, such as a different voltage, frequency, temperature or data volume to determine a safe level of operation. Detected errors may be reported and a new allowable or safe condition (e.g. voltage, frequency, temperature) may be reported or set. It may be determined by an operating system or other test module whether to shut down a core or chip or to adjust an internal voltage or frequency or external cooling condition or workload.
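As a minimal, non-limiting sketch of repeating a self-test at different stress conditions until a safe level of operation is found, consider the following. The `StressCondition` type, its two fields, the `run_bist` callback and the caller-supplied ordering of conditions are all assumptions made for illustration; an actual embodiment may vary temperature, data volume or other conditions as well.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class StressCondition:
    """Hypothetical stress setting; only voltage and frequency are modeled."""
    voltage_v: float
    frequency_mhz: int


def find_safe_condition(
    conditions: Sequence[StressCondition],
    run_bist: Callable[[StressCondition], bool],
) -> Tuple[Optional[StressCondition], List[StressCondition]]:
    """Try each condition in the order given and stop at the first that passes.

    run_bist is a hypothetical hook that performs the LBIST or ABIST under
    the given condition and returns True when no error is detected.
    Returns the first passing condition (or None) and the failed conditions,
    which may then be reported.
    """
    failed: List[StressCondition] = []
    for condition in conditions:
        if run_bist(condition):
            return condition, failed
        failed.append(condition)
    return None, failed
```

When no condition passes, the `None` result corresponds to the case in which an operating system or other test module may decide to shut down the core or chip.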

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention to the particular embodiments described. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments of the present disclosure.

While preferred embodiments have been described above, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow.

What is claimed is:
 1. A method of testing a circuit, comprising: halting a flow of normal data through the circuit; running test data through the circuit while subjecting the circuit to a stress condition; and determining whether a hard error exists in the circuit based on the running of the test data.
 2. The method of claim 1, wherein the circuit includes combinational logic in series with at least one set of input latches and at least one set of output latches.
 3. The method of claim 2, wherein halting the flow of normal data through the circuit includes stopping the flow of data into normal data inputs of the at least one set of input latches; and storing the normal data from the at least one set of input latches and the at least one set of output latches in storage prior to running the test data through the circuit.
 4. The method of claim 3, further comprising: restoring the normal data to the at least one set of input latches and the at least one set of output latches based on determining that a number of hard errors in the circuit is less than a predetermined threshold.
 5. The method of claim 2, wherein both of the normal data and the test data are input through normal data inputs of the at least one set of input latches and read out from normal data outputs of the at least one set of output latches.
 6. The method of claim 1, wherein subjecting the circuit to the stress condition includes running the test data through the circuit in a same environment as the circuit operates normally.
 7. The method of claim 1, wherein subjecting the circuit to the stress condition includes simulating conditions in an environment in which the circuit operates normally.
 8. The method of claim 1, wherein the stress condition includes at least one of an operating frequency of the circuit, an operating voltage of the circuit, a temperature of the circuit, and a data density provided to the circuit.
 9. The method of claim 1, wherein running the test data through the circuit includes performing a built-in self-test (BIST) of the circuit.